AGREEMENT ON MRI DIAGNOSIS IN COMPRESSIVE MALIGNANT VERTEBRAL FRACTURES

ABSTRACT Objective: To verify the interobserver and intraobserver agreement of magnetic resonance imaging (MRI) in the diagnosis of malignant vertebral compressive fractures (MVCF). Methods: We retrospectively included lumbar spine MRI examinations of 63 patients diagnosed with non-traumatic compressive vertebral fractures. Each lumbar vertebra was classified as having no fracture, a fracture with benign characteristics, or a fracture with malignant characteristics. Two radiology residents, one musculoskeletal radiology fellow, one musculoskeletal radiologist, and two spine surgeons evaluated the MRI examinations independently and blindly. Each observer performed two readings, with a 15-day interval between them. The simple kappa coefficient was used to calculate intra- and interobserver agreement. For the diagnostic performance analysis, the reference standard was bone biopsy or clinical and imaging follow-up of at least two years. Diagnostic performance was assessed by calculating sensitivity, specificity, accuracy, and positive and negative predictive values with 95% confidence intervals (CI). Results: We observed substantial to perfect intraobserver agreement (kappa 0.80 to 1.00) and substantial interobserver agreement (kappa 0.64 to 0.77). Sensitivity for the detection of MVCF was generally moderate, except for the second-year radiology resident, who achieved lower sensitivity. Specificity, accuracy, and negative predictive value were high for all observers. Conclusion: MRI diagnosis of MVCF showed substantial interobserver agreement. The second-year radiology resident achieved lower sensitivity but high specificity for MVCF. Among the senior observers, there was no statistically significant difference between the spine surgeons and the musculoskeletal radiologist. Level of Evidence III; Diagnostic.


INTRODUCTION
The occurrence of non-traumatic fractures in the thoracic and lumbar spine is a common problem, especially in elderly individuals, and osteoporosis is the leading cause of these fractures. 1,2 However, the spine is also a frequent site of metastatic disease, which can result in pathological fractures. 3 Establishing the etiology of these fractures is fundamental, since it can change the therapeutic plan and the prognosis. Failure or delay in diagnosing metastatic lesions may compromise optimal treatment and lead to worse clinical outcomes. 4 Magnetic resonance imaging (MRI) is considered the imaging gold standard for differentiating pathological fractures associated with metastatic lesions from benign osteoporotic fractures. 5-7 Several MRI signs have been described as useful in distinguishing between these fractures, but their interpretation is subjective, and there are no definitive diagnostic criteria. 5 Thus, distinguishing these fractures on imaging may remain uncertain, even for experienced radiologists and spine surgeons. 4 The literature on intraobserver and interobserver agreement in the diagnosis of malignant vertebral compressive fractures (MVCF) is scarce. To the best of our knowledge, it remains unclear whether the observer's medical specialty influences diagnostic performance for MVCF. Thus, the objectives of the present study were to verify intra- and interobserver agreement in MVCF detection and to investigate diagnostic performance for these fractures, comparing radiologists and spine surgeons.

METHODS
Type of study, population and ethical aspects
This retrospective, cross-sectional, observational diagnostic study used a database of spine MRI examinations, with approval from the Institutional Review Board of Ribeirão Preto Medical School, University of São Paulo, Ribeirão Preto, Brazil (Process HCRP nº 13568/2016). Only patients with a previous diagnosis of compressive vertebral fracture secondary to bone insufficiency or malignant disease were included. Cases were identified by searching the final impression of lumbar spine MRI reports in the Radiological Information System for the keywords "fracture", "malignant", "osteoporotic", and "osteoporosis". Exclusion criteria were a history of chemotherapy, radiotherapy, or surgery before the MRI study, and a previous history of spinal trauma or infection. The MRI files were anonymized, and patient confidentiality was maintained throughout the study. A total of 220 potentially eligible patients were initially identified; after the exclusion criteria were applied, 63 patients were included in the study. Lumbar spine MRI of all patients was acquired on the same 1.5 Tesla scanner (Achieva, Philips, Eindhoven, Netherlands). In all patients, the diagnosis was confirmed either histopathologically or, when biopsy was not clinically indicated, by clinical and imaging follow-up of at least two years.

Image analysis
Each examination, in DICOM format, was evaluated twice by six observers: two radiology residents in their second and third years of training (2ndRR and 3rdRR, respectively), one musculoskeletal radiology fellow (MSKRF), one musculoskeletal radiologist with three years of experience in this area (MSKR), and two spine surgeons. The radiology residents were at the end of their respective training years. The two spine surgeons, with seven and eight years of experience, were designated SS7 and SS8, respectively. The observers performed independent, blinded evaluations, without knowledge of each patient's final diagnosis or of the etiology of the vertebral fracture, and without access to the other observers' assessments. The evaluation included all images acquired in the clinical routine: sagittal, axial, and coronal T2-weighted images and sagittal T1-weighted images. In some cases, additional sequences, such as fat-saturated and post-contrast sequences, were also used. All observers performed a second evaluation of the images, with a minimum interval of two weeks between assessments, to investigate intraobserver agreement. Before their second evaluation, the spine surgeons read scientific articles addressing the differentiation between benign osteoporotic and malignant fractures, 7,8 which made it possible to compare their diagnostic performance before and after this focused study of the topic. Only the first assessment of each observer was used in the interobserver agreement analysis. The analysis considered only the five lumbar vertebrae of the included patients. When a lumbosacral transitional vertebra was present, it was considered the L5 vertebra to standardize vertebral identification.
The lumbar vertebral bodies were numbered from caudal to cranial, and each lumbar vertebral body was classified as having a benign osteoporotic fracture, a malignant fracture, or no fracture.

Statistical analysis
All analyses were performed with SAS software, version 9.0 (SAS Institute Inc., Campus Drive, Cary, NC, USA). Intra- and interobserver agreement were calculated using the simple kappa coefficient with 95% confidence intervals (CI). Kappa values were interpreted according to the classification proposed by Landis and Koch, 9 in which values below 0.00 indicate poor agreement; 0.00 to 0.20, slight agreement; 0.21 to 0.40, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, substantial agreement; and 0.81 to 1.00, almost perfect agreement. Diagnostic performance for malignant fractures was assessed by calculating sensitivity (SEN), specificity (SP), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACU), each with its 95% CI.
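To make the agreement analysis concrete, the following is a minimal Python sketch of the simple (unweighted) kappa coefficient for two readings of the same vertebrae, together with the Landis and Koch labels described above. It is not the authors' SAS code: the function names, the category labels ("none", "benign", "malignant") and the example ratings are hypothetical. The confidence interval uses a common large-sample approximation of the standard error; SAS reports a more elaborate asymptotic standard error, so intervals would not match exactly.

```python
import math
from collections import Counter


def simple_kappa(reading_a, reading_b, z=1.96):
    """Unweighted (simple) kappa between two readings of the same vertebrae.

    Returns (kappa, ci_lower, ci_upper). Assumption: the CI uses the
    large-sample approximation SE = sqrt(po * (1 - po) / (n * (1 - pe) ** 2)),
    not the asymptotic variance reported by SAS.
    """
    if not reading_a or len(reading_a) != len(reading_b):
        raise ValueError("readings must be non-empty and of equal length")
    n = len(reading_a)
    # Observed agreement: share of vertebrae given the same class in both readings.
    po = sum(a == b for a, b in zip(reading_a, reading_b)) / n
    # Chance agreement: products of the marginal class frequencies, summed over classes.
    count_a, count_b = Counter(reading_a), Counter(reading_b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    # Clip the interval to the admissible kappa range [-1, 1].
    return kappa, max(-1.0, kappa - z * se), min(1.0, kappa + z * se)


def landis_koch(kappa):
    """Descriptive label for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: two readings of ten vertebrae by one observer.
first = ["none"] * 6 + ["benign"] * 2 + ["malignant"] * 2
second = ["none"] * 6 + ["benign"] * 1 + ["malignant"] * 3
k, low, high = simple_kappa(first, second)
print(f"kappa = {k:.2f} (95% CI {low:.2f} to {high:.2f}), {landis_koch(k)}")
```

With these hypothetical readings, nine of ten vertebrae agree (observed agreement 0.90) and chance agreement is 0.44, so kappa = (0.90 - 0.44)/(1 - 0.44) ≈ 0.82, in the almost perfect band; a kappa of exactly 0.80, as reported for SS8, is labeled substantial.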

RESULTS
Intraobserver agreement
Intraobserver agreement between the two evaluations was almost perfect for almost all observers (Table 1). Only the surgeon with eight years of experience (SS8) showed substantial intraobserver agreement (kappa = 0.80), and the second-year radiology resident (2ndRR) achieved perfect intraobserver agreement (kappa = 1.00). There was no statistically significant difference in intraobserver agreement according to specialty or training level, except for the 2ndRR. Among the spine surgeons, intraobserver agreement was higher for the surgeon with seven years of experience (SS7), although the difference was not statistically significant. When radiologists and surgeons were compared, intraobserver agreement was higher for the senior musculoskeletal radiologist.

Interobserver agreement
Interobserver agreement was analyzed using the first evaluation of all observers (Table 2). We did not identify a statistically significant difference in interobserver agreement between specialties. Considering the confidence intervals, interobserver agreement among all observers ranged from moderate to almost perfect. Among the radiologists, the highest interobserver agreement was between the 3rdRR and the MSKRF, with confidence limits ranging from substantial to almost perfect. Among the surgeons, interobserver agreement ranged from moderate to substantial. Between radiologists and surgeons, interobserver agreement was generally substantial (mean kappa values from 0.62 to 0.77), and the highest agreement occurred between the 3rdRR and SS8.
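The interobserver design described above (first reading only, every pair of the six observers, summarized within and across specialties) can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis: the observer codes come from the text, but the ratings are hypothetical, only kappa point estimates are computed, and the helper names (pairwise_kappa, range_by_group) are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Observer codes as defined in the text, grouped by specialty.
SPECIALTY = {"2ndRR": "radiology", "3rdRR": "radiology", "MSKRF": "radiology",
             "MSKR": "radiology", "SS7": "surgery", "SS8": "surgery"}


def kappa(reading_a, reading_b):
    """Simple (unweighted) kappa point estimate for two raters."""
    n = len(reading_a)
    po = sum(a == b for a, b in zip(reading_a, reading_b)) / n
    count_a, count_b = Counter(reading_a), Counter(reading_b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if pe == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (po - pe) / (1 - pe)


def pairwise_kappa(first_readings):
    """Kappa for every observer pair, using each observer's first reading only."""
    return {(a, b): kappa(first_readings[a], first_readings[b])
            for a, b in combinations(first_readings, 2)}


def range_by_group(pair_kappas):
    """Min and max kappa within radiology, within surgery, and across specialties."""
    groups = {}
    for (a, b), value in pair_kappas.items():
        key = " / ".join(sorted((SPECIALTY[a], SPECIALTY[b])))
        groups.setdefault(key, []).append(value)
    return {key: (min(values), max(values)) for key, values in groups.items()}


# Hypothetical first-reading labels for five vertebrae; SS8 disagrees on one.
base = ["none", "none", "benign", "malignant", "none"]
readings = {observer: list(base) for observer in SPECIALTY}
readings["SS8"][2] = "malignant"
pairs = pairwise_kappa(readings)
print(len(pairs), range_by_group(pairs))
```

With six observers there are 15 pairs: 6 among the four radiology observers, 1 between the two surgeons, and 8 across specialties.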

Diagnostic performance
All values calculated for the evaluation of diagnostic performance are shown in Table 3. In their first evaluation, observers 2ndRR and SS7 had lower mean sensitivity, although the difference was not statistically significant. The SS7 observer showed a significant increase in sensitivity in the second assessment (after studying the scientific articles), but this observer's mean sensitivity remained lower than that of the other observers, again without statistical significance. The other observers had moderate and similar mean sensitivity. Specificity, accuracy, and negative predictive value were high for all observers. As shown in Table 3, the PPV ranged from moderate to high, with no statistically significant difference between observers.
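The diagnostic performance measures reported in Table 3 follow from a 2 × 2 table of each observer's readings against the reference standard. The sketch below is a hedged Python illustration, not the authors' procedure: it assumes that each vertebra is analyzed as an independent unit (ignoring clustering within patients), that "benign fracture" and "no fracture" are pooled as not malignant, and that the 95% CIs are Wilson score intervals, since the text does not state which interval was computed. The function names and ratings are hypothetical.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, not stated in the text)."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_performance(ratings, reference, positive="malignant"):
    """SEN, SP, PPV, NPV and ACU for one observer, per vertebra.

    Assumption: the three-class ratings are collapsed to malignant vs. not
    malignant. Each value is (estimate, (ci_lower, ci_upper)).
    """
    tp = fp = tn = fn = 0
    for rating, truth in zip(ratings, reference):
        called, malignant = rating == positive, truth == positive
        if called and malignant:
            tp += 1
        elif called:
            fp += 1
        elif malignant:
            fn += 1
        else:
            tn += 1

    def metric(num, den):
        return (num / den if den else float("nan"), wilson_ci(num, den))

    return {
        "SEN": metric(tp, tp + fn),
        "SP": metric(tn, tn + fp),
        "PPV": metric(tp, tp + fp),
        "NPV": metric(tn, tn + fn),
        "ACU": metric(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical readings for six vertebrae against the reference standard.
observer = ["malignant", "malignant", "benign", "none", "none", "none"]
reference = ["malignant", "benign", "malignant", "none", "none", "benign"]
for name, (value, (low, high)) in diagnostic_performance(observer, reference).items():
    print(f"{name}: {value:.2f} (95% CI {low:.2f} to {high:.2f})")
```

In this hypothetical example the observer has one true positive, one false positive, one false negative, and three true negatives, so SEN = 1/2, SP = 3/4, and ACU = 4/6.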

DISCUSSION
Although accurately distinguishing benign osteoporotic from malignant spinal fractures is important, especially in older individuals, several studies suggest that defining specific criteria for this differential diagnosis can be difficult. 5,7,10 However, the influence of medical specialty on performance in this differential diagnosis has not been evaluated. The objective of the present study was to compare radiologists with spine surgeons, also considering the experience of these specialists. Intra- and interobserver agreement were also assessed according to medical specialty and length of professional experience.
In the present study, we did not identify a statistically significant difference among training levels in diagnostic performance for distinguishing benign osteoporotic from malignant vertebral fractures. The only exception was the low sensitivity of the second-year radiology resident, who had less exposure to MRI training. In general, specificity (always higher than 90%) was considerably higher than sensitivity for malignant fractures; that is, radiologists and spine surgeons rarely classified a non-malignant vertebra as malignant. Kato et al. also observed specificity higher than 85% in 200 fractures evaluated by two spine surgeons. 7 For the spine surgeons, reading scientific articles on the subject was associated with improved diagnostic performance. This suggests that results are better when observers are familiar with the characteristic imaging signs of benign osteoporotic and malignant vertebral fractures. 7,8 Agreement was substantial to almost perfect for all observers participating in the study, which suggests high reproducibility of the evaluation based on the diagnostic features commonly attributed to benign osteoporotic and malignant vertebral fractures. Several authors have reported that the interpretation of these characteristic signs is subjective and, therefore, that interobserver reproducibility varies considerably across studies. 5,7,11-13 Notably, most diagnostic errors in the present study involved cases of multiple myeloma.
This greater diagnostic difficulty in fractures associated with multiple myeloma agrees with the literature, as these fractures frequently show MRI features compatible with benignity. 14 In the study by Lecouvet et al., 67% of fractures associated with multiple myeloma had MRI signs characteristic of benign vertebral fractures. 14 Multiple myeloma patients comprised 40% of our cases with vertebral fractures secondary to metastasis, which may explain why the average sensitivity for MVCF was only moderate. Given these difficulties in the differential diagnosis between benign osteoporotic and malignant vertebral fractures, some authors have developed instruments to improve it. Recently, a score based on MRI signs, the META score, was proposed to help identify vertebral fractures caused by metastases. 7 Using this score, the authors reported an accuracy of 96.6% in the diagnosis of metastatic malignant fractures. In the present study, in which the observers did not use any specific diagnostic instrument, the mean accuracy was 90.1%, and the musculoskeletal radiology fellow obtained 94% accuracy. However, cases of multiple myeloma were excluded from the study that described the META score, 7 whereas they were included in the present study. More recently, computer-assisted classification and machine-learning techniques have been applied to MVCF diagnosis on spine MRI, with promising results. 15-17 Features derived from Fourier and wavelet transforms, together with the fractal dimension, achieved up to 94.7% correct classification, with the area under the receiver operating characteristic curve (AUC) reaching 0.95. 15 Neural networks achieved an AUC of 0.97 in distinguishing between normal and fractured vertebral bodies, and 0.92 in discriminating between benign and malignant fractures. 16 An ensemble combining different classification models for the final class assignment reached a mean AUC of 0.94. 17 Future studies with external validation are needed to confirm the usefulness of artificial intelligence in the diagnosis of MVCF.
Table 2. Interobserver agreement, assessed by the simple kappa coefficient and its respective 95% confidence intervals.
The present study has limitations that deserve mention. First, it is a retrospective investigation. Second, not all cases had histopathological confirmation of the fracture etiology. Cases strongly suggestive of MVCF were biopsied, whereas patients whose MRI signs favored a benign vertebral fracture were followed clinically and with follow-up MRI. All cases had a minimum clinical follow-up of two years from vertebral fracture detection, to minimize the risk of including fractures initially classified as osteoporotic that were actually malignant (false negatives). Classic studies on MVCF share this limitation because, in clinical practice, biopsy of a vertebral compression fracture is not always necessary or indicated. Therefore, the reference standard for the presence or absence of metastases was a best valuable comparator based on clinical, histologic, biologic, and imaging data. 18-21

CONCLUSION
MRI diagnosis of MVCF showed substantial interobserver agreement. The second-year radiology resident achieved lower sensitivity but high specificity for MVCF. Among the senior observers, there was no statistically significant difference between the spine surgeons and the musculoskeletal radiologist.

Declarations
Funding: "This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001."
Availability of data and material (data transparency): The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request.